Linear Classification
two major components: a score function that maps the raw data to class scores, and a loss function that quantifies the agreement between the predicted scores and the ground truth labels. We will then cast this as an optimization problem in which we will minimize the loss function with respect to the parameters of the score function.
Parameterized mapping from images to label scores
score function that maps the pixel values of an image to confidence scores for each class. let’s assume a training dataset of images , each associated with a label . Here and . That is, we have N examples (each with a dimensionality of D) and K distinct categories.
For example, in CIFAR-10 we have a training set of N = 50,000 images, each with D = 32 x 32 x 3 = 3072 pixels, and K = 10, since there are 10 distinct classes (dog, cat, car, etc). We will now define the score function f:RD↦RK that maps the raw image pixels to class scores.
simplest possible function, a linear mapping: